Data

I chose the cardox data from the astsa package. This data is from 1958 to 2018 of Monthly Carbon Dioxide Levels at Mauna Loa.

data(
  list = "cardox",
  package = "astsa"
)

Part A

First I created a plot of the time series with a horizontal line for the sample mean of the data and a trend line from regression.

sample_mean <- mean(cardox)

lm_df <- lm(
  formula = cardox~time(cardox)
)
summary(
  object = lm_df
)
## 
## Call:
## lm(formula = cardox ~ time(cardox))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.8154 -2.9392 -0.5391  2.6590 11.1524 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.732e+03  1.734e+01  -157.6   <2e-16 ***
## time(cardox)  1.552e+00  8.720e-03   178.0   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.129 on 727 degrees of freedom
## Multiple R-squared:  0.9776, Adjusted R-squared:  0.9775 
## F-statistic: 3.168e+04 on 1 and 727 DF,  p-value: < 2.2e-16
astsa::tsplot(
  x = cardox,
  ylab = "Carbon Dioxide Levels at Mauna Loa",
  col = 4,
  lwd = 1,
  main = "Cardox Data"
)
abline(
  h = sample_mean,
  col = 2
) 
abline(
  reg = lm_df,
  col = 1
) 
legend("topleft", legend=c("Original Data", "Sample Mean", "Simple Linear Regression"), col=c("blue", "red", "black"), lwd=1:1:1, bty = "n")

Using the decompose function in R we can see the original time series, trend, seasonality, and randomness in the data.

plot(decompose(cardox))

As we can see from the plots that there is a clear increasing trend and there also appears to be seasonality.

Since the data has an increasing trend and has seasonality I will do 1 degree of non-seasonal differencing and 1 degree of seasonal differencing and see how the data looks after.

diff_cardox <- diff(cardox, differences = 1)

astsa::acf2(
  series = diff_cardox
)

##      [,1]  [,2]  [,3]  [,4]  [,5] [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
## ACF   0.7  0.24 -0.21 -0.46 -0.51 -0.5 -0.51 -0.45 -0.19  0.24  0.71  0.92
## PACF  0.7 -0.48 -0.31 -0.06 -0.12 -0.4 -0.54 -0.51 -0.35 -0.22  0.18  0.39
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF   0.71  0.24 -0.20 -0.45 -0.51 -0.49 -0.50 -0.44 -0.20  0.25  0.70  0.91
## PACF  0.17 -0.01  0.02  0.04  0.03  0.11  0.03 -0.02 -0.07 -0.05  0.02  0.12
##      [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF   0.70  0.23 -0.20 -0.45 -0.50 -0.49 -0.49 -0.43 -0.19  0.25  0.69  0.89
## PACF  0.05 -0.11  0.03  0.00  0.03  0.02  0.02  0.08  0.03  0.00  0.05  0.05
##      [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF   0.68  0.22 -0.20 -0.44 -0.49 -0.48 -0.48 -0.42 -0.18  0.24  0.69  0.87
## PACF  0.02 -0.06  0.01  0.04  0.05 -0.01  0.02  0.04  0.00 -0.05  0.06  0.03
diff2_seasonal_cardox  = diff(
  x = diff_cardox,
  lag = 12
)

astsa::acf2(
  series = diff2_seasonal_cardox
)

##       [,1]  [,2]  [,3]  [,4] [,5]  [,6]  [,7]  [,8] [,9] [,10] [,11] [,12]
## ACF  -0.32  0.01 -0.08 -0.02 0.06 -0.02  0.00 -0.01 0.11 -0.07  0.16 -0.46
## PACF -0.32 -0.11 -0.12 -0.10 0.01 -0.01 -0.01 -0.01 0.12  0.01  0.18 -0.40
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF   0.12  0.06 -0.02  0.04 -0.08  0.05 -0.04  0.00 -0.06  0.03 -0.01 -0.02
## PACF -0.17 -0.02 -0.07 -0.04 -0.04  0.02 -0.01 -0.03  0.01 -0.01  0.07 -0.27
##      [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF   0.02 -0.07  0.08 -0.03  0.04 -0.02 -0.03  0.03  0.00  0.05 -0.02  0.02
## PACF -0.14 -0.09 -0.02 -0.02 -0.01  0.04 -0.05 -0.03 -0.01  0.04  0.06 -0.18
##      [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF   0.03  0.00 -0.05 -0.01  0.02 -0.02  0.01  0.02 -0.02 -0.02  0.05 -0.03
## PACF -0.02 -0.05 -0.04 -0.06 -0.02 -0.02 -0.12  0.00 -0.05  0.00  0.09 -0.12
astsa::tsplot(
  x = cbind(
    cardox, diff_cardox, diff2_seasonal_cardox
  ),
  main = "Time series plots of differenced Cardox data"
)

We can see after 1 degree of non-seasonal differencing and 1 degree of seasonal differencing that the time series plot looks stationary. We can also see that the ACF cuts off at 1 while the PACF seems to trail off with some indication that there is underlying seasonal trend.

Part 2

A reasonable seasonal trend for my data is annual. Therefore, my seasonal period is year and the subinterval is month.

First we can visualize the data with subinterval as the horizontal axis, and the observed value as the vertical axis. The first plot is the original data, then non-seasonal differencing, then lastly seasonal differencing.

monthplot(
  x = cardox
)

monthplot(
  x = diff_cardox
)

monthplot(
  x = diff2_seasonal_cardox
)

Then we can visualize the data with the seasonal period as the horizontal axis, and observed value as the vertical axis. Here we have the orignal data.

cardox_monthly_list <- split(cardox, f = cycle(cardox))

# Set up the layout for the subplots
par(mfrow = c(3, 4))

# Loop through each month and create a separate plot
for (i in 1:12) {
  # Create a time series object for the current month
  monthly_ts <- cardox_monthly_list[[i]]
  
  # Plot the monthly time series with the month names as the x-axis labels
  astsa::tsplot(
    x = monthly_ts,
    xlab = "Year",
    ylab = "CO2 Levels",
    main = month.abb[i]
  )
}

Here the plots are updated after performing non-seasonal differencing and seasonal differencing on the data.

diff2_cardox_monthly_list <- split(diff2_seasonal_cardox, f = cycle(diff2_seasonal_cardox))

# Set up the layout for the subplots
par(mfrow = c(3, 4))

# Loop through each month and create a separate plot
for (i in 1:12) {
  # Create a time series object for the current month
  monthly_ts <- diff2_cardox_monthly_list[[i]]
  
  # Plot the monthly time series with the month names as the x-axis labels
  astsa::tsplot(
    x = monthly_ts,
    xlab = "Year",
    ylab = "Differenced CO2 Levels",
    main = month.abb[i]
  )
}

PART 3

Looking at the ACF and PACF I fit six seasonal autoregressive integrated models.

ARIMA(1,1,0) x seasonal ARIMA(1,1,0)[12]

fit1 <- astsa::sarima(cardox, p = 1, d = 1, q = 0,P = 1, D = 1, Q = 0, S = 12)
## initial  value -0.833682 
## iter   2 value -1.016921
## iter   3 value -1.016976
## iter   4 value -1.016978
## iter   4 value -1.016978
## iter   4 value -1.016978
## final  value -1.016978 
## converged
## initial  value -1.011038 
## iter   2 value -1.011050
## iter   3 value -1.011051
## iter   3 value -1.011051
## iter   3 value -1.011051
## final  value -1.011051 
## converged

fit1
## $fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
##         REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ar1     sar1
##       -0.3291  -0.4811
## s.e.   0.0353   0.0331
## 
## sigma^2 estimated as 0.1318:  log likelihood = -292.05,  aic = 590.1
## 
## $degrees_of_freedom
## [1] 714
## 
## $ttable
##      Estimate     SE  t.value p.value
## ar1   -0.3291 0.0353  -9.3127       0
## sar1  -0.4811 0.0331 -14.5234       0
## 
## $AIC
## [1] 0.8241552
## 
## $AICc
## [1] 0.8241787
## 
## $BIC
## [1] 0.8433186

ARIMA(1,1,0) x seasonal ARIMA(2,1,0)[12]

fit2 <- astsa::sarima(cardox, p = 1, d = 1, q = 0,P = 2, D = 1, Q = 0, S = 12)
## initial  value -0.832530 
## iter   2 value -1.027055
## iter   3 value -1.074395
## iter   4 value -1.079637
## iter   5 value -1.079654
## iter   6 value -1.079659
## iter   6 value -1.079659
## iter   6 value -1.079659
## final  value -1.079659 
## converged
## initial  value -1.067746 
## iter   2 value -1.067775
## iter   3 value -1.067775
## iter   4 value -1.067775
## iter   4 value -1.067775
## iter   4 value -1.067775
## final  value -1.067775 
## converged

fit2
## $fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
##         REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ar1     sar1     sar2
##       -0.3400  -0.6468  -0.3403
## s.e.   0.0352   0.0361   0.0365
## 
## sigma^2 estimated as 0.1172:  log likelihood = -251.43,  aic = 510.87
## 
## $degrees_of_freedom
## [1] 713
## 
## $ttable
##      Estimate     SE  t.value p.value
## ar1   -0.3400 0.0352  -9.6453       0
## sar1  -0.6468 0.0361 -17.9379       0
## sar2  -0.3403 0.0365  -9.3240       0
## 
## $AIC
## [1] 0.7134998
## 
## $AICc
## [1] 0.7135469
## 
## $BIC
## [1] 0.7390511

ARIMA(3,1,0) x seasonal ARIMA(1,1,0)[12]

fit3 <- astsa::sarima(cardox, p = 3, d = 1, q = 0,P = 1, D = 1, Q = 0, S = 12)
## initial  value -0.833327 
## iter   2 value -1.018565
## iter   3 value -1.027807
## iter   4 value -1.030047
## iter   5 value -1.030079
## iter   6 value -1.030080
## iter   6 value -1.030080
## iter   6 value -1.030080
## final  value -1.030080 
## converged
## initial  value -1.023892 
## iter   2 value -1.023969
## iter   3 value -1.023969
## iter   3 value -1.023969
## iter   3 value -1.023969
## final  value -1.023969 
## converged

fit3
## $fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
##         REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ar1      ar2      ar3     sar1
##       -0.3786  -0.1544  -0.1146  -0.4804
## s.e.   0.0372   0.0394   0.0372   0.0331
## 
## sigma^2 estimated as 0.1284:  log likelihood = -282.8,  aic = 575.6
## 
## $degrees_of_freedom
## [1] 712
## 
## $ttable
##      Estimate     SE  t.value p.value
## ar1   -0.3786 0.0372 -10.1897  0.0000
## ar2   -0.1544 0.0394  -3.9177  0.0001
## ar3   -0.1146 0.0372  -3.0800  0.0021
## sar1  -0.4804 0.0331 -14.5082  0.0000
## 
## $AIC
## [1] 0.8039063
## 
## $AICc
## [1] 0.8039848
## 
## $BIC
## [1] 0.8358454

ARIMA(0,1,1) x seasonal ARIMA(0,1,1)[12]

fit4 <- astsa::sarima(cardox, p = 0, d = 1, q = 1,P = 0, D = 1, Q = 1, S = 12)
## initial  value -0.829273 
## iter   2 value -1.063862
## iter   3 value -1.100292
## iter   4 value -1.125518
## iter   5 value -1.132304
## iter   6 value -1.132783
## iter   7 value -1.134864
## iter   8 value -1.134927
## iter   9 value -1.134931
## iter   9 value -1.134931
## iter   9 value -1.134931
## final  value -1.134931 
## converged
## initial  value -1.152690 
## iter   2 value -1.156302
## iter   3 value -1.157090
## iter   4 value -1.158203
## iter   5 value -1.158305
## iter   6 value -1.158310
## iter   7 value -1.158310
## iter   7 value -1.158310
## iter   7 value -1.158310
## final  value -1.158310 
## converged

fit4
## $fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
##         REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1     sma1
##       -0.3875  -0.8641
## s.e.   0.0390   0.0192
## 
## sigma^2 estimated as 0.09634:  log likelihood = -186.61,  aic = 379.22
## 
## $degrees_of_freedom
## [1] 714
## 
## $ttable
##      Estimate     SE  t.value p.value
## ma1   -0.3875 0.0390  -9.9277       0
## sma1  -0.8641 0.0192 -45.1205       0
## 
## $AIC
## [1] 0.5296369
## 
## $AICc
## [1] 0.5296604
## 
## $BIC
## [1] 0.5488003

ARIMA(1,1,1) x seasonal ARIMA(1,1,1)[12]

fit5 <- astsa::sarima(cardox, p = 1, d = 1, q = 1,P = 1, D = 1, Q = 1, S = 12)
## initial  value -0.833682 
## iter   2 value -1.049183
## iter   3 value -1.106150
## iter   4 value -1.144395
## iter   5 value -1.153421
## iter   6 value -1.154618
## iter   7 value -1.155168
## iter   8 value -1.155888
## iter   9 value -1.157266
## iter  10 value -1.157866
## iter  11 value -1.157971
## iter  12 value -1.157987
## iter  13 value -1.157987
## iter  14 value -1.157988
## iter  15 value -1.157988
## iter  16 value -1.157989
## iter  17 value -1.157989
## iter  18 value -1.157989
## iter  18 value -1.157989
## iter  18 value -1.157989
## final  value -1.157989 
## converged
## initial  value -1.157964 
## iter   2 value -1.159201
## iter   3 value -1.160147
## iter   4 value -1.160440
## iter   5 value -1.160506
## iter   6 value -1.160730
## iter   7 value -1.160796
## iter   8 value -1.160805
## iter   9 value -1.160806
## iter   9 value -1.160806
## iter   9 value -1.160806
## final  value -1.160806 
## converged

fit5
## $fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
##         REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ma1     sar1     sma1
##       0.1941  -0.5578  -0.0008  -0.8647
## s.e.  0.0953   0.0813   0.0427   0.0211
## 
## sigma^2 estimated as 0.09585:  log likelihood = -184.82,  aic = 379.65
## 
## $degrees_of_freedom
## [1] 712
## 
## $ttable
##      Estimate     SE  t.value p.value
## ar1    0.1941 0.0953   2.0364  0.0421
## ma1   -0.5578 0.0813  -6.8645  0.0000
## sar1  -0.0008 0.0427  -0.0185  0.9852
## sma1  -0.8647 0.0211 -40.9625  0.0000
## 
## $AIC
## [1] 0.5302324
## 
## $AICc
## [1] 0.530311
## 
## $BIC
## [1] 0.5621715

ARIMA(1,1,1) x seasonal ARIMA(2,1,1)[12]

fit6 <- astsa::sarima(cardox, p = 1, d = 1, q = 1,P = 2, D = 1, Q = 1, S = 12)
## initial  value -0.832530 
## iter   2 value -1.063340
## iter   3 value -1.132522
## iter   4 value -1.144263
## iter   5 value -1.158735
## iter   6 value -1.161729
## iter   7 value -1.165939
## iter   8 value -1.168425
## iter   9 value -1.169703
## iter  10 value -1.170780
## iter  11 value -1.171582
## iter  12 value -1.172552
## iter  13 value -1.172683
## iter  14 value -1.172791
## iter  15 value -1.172800
## iter  16 value -1.172899
## iter  17 value -1.172937
## iter  18 value -1.172952
## iter  19 value -1.172953
## iter  20 value -1.172953
## iter  20 value -1.172953
## iter  20 value -1.172953
## final  value -1.172953 
## converged
## initial  value -1.160319 
## iter   2 value -1.160958
## iter   3 value -1.161088
## iter   4 value -1.161134
## iter   5 value -1.161265
## iter   6 value -1.161279
## iter   7 value -1.161281
## iter   8 value -1.161285
## iter   9 value -1.161288
## iter  10 value -1.161289
## iter  11 value -1.161289
## iter  11 value -1.161289
## iter  11 value -1.161289
## final  value -1.161289 
## converged

fit6
## $fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
##         REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ma1     sar1     sar2     sma1
##       0.1966  -0.5616  -0.0098  -0.0357  -0.8558
## s.e.  0.0948   0.0807   0.0444   0.0429   0.0249
## 
## sigma^2 estimated as 0.09574:  log likelihood = -184.48,  aic = 380.95
## 
## $degrees_of_freedom
## [1] 711
## 
## $ttable
##      Estimate     SE  t.value p.value
## ar1    0.1966 0.0948   2.0746  0.0384
## ma1   -0.5616 0.0807  -6.9606  0.0000
## sar1  -0.0098 0.0444  -0.2205  0.8256
## sar2  -0.0357 0.0429  -0.8329  0.4052
## sma1  -0.8558 0.0249 -34.4245  0.0000
## 
## $AIC
## [1] 0.5320593
## 
## $AICc
## [1] 0.5321773
## 
## $BIC
## [1] 0.5703862
fit6 <- astsa::sarima(cardox, p = 0, d = 1, q = 3,P = 0, D = 1, Q = 1, S = 12)
## initial  value -0.829273 
## iter   2 value -1.067982
## iter   3 value -1.105678
## iter   4 value -1.133375
## iter   5 value -1.139032
## iter   6 value -1.139763
## iter   7 value -1.140711
## iter   8 value -1.140792
## iter   9 value -1.140798
## iter  10 value -1.140799
## iter  10 value -1.140799
## final  value -1.140799 
## converged
## initial  value -1.157489 
## iter   2 value -1.161065
## iter   3 value -1.161857
## iter   4 value -1.162557
## iter   5 value -1.162659
## iter   6 value -1.162701
## iter   7 value -1.162702
## iter   8 value -1.162702
## iter   8 value -1.162702
## iter   8 value -1.162702
## final  value -1.162702 
## converged

fit6
## $fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
##         REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1      ma2      ma3     sma1
##       -0.3691  -0.0231  -0.0752  -0.8661
## s.e.   0.0379   0.0413   0.0387   0.0189
## 
## sigma^2 estimated as 0.09547:  log likelihood = -183.47,  aic = 376.93
## 
## $degrees_of_freedom
## [1] 712
## 
## $ttable
##      Estimate     SE  t.value p.value
## ma1   -0.3691 0.0379  -9.7284  0.0000
## ma2   -0.0231 0.0413  -0.5594  0.5761
## ma3   -0.0752 0.0387  -1.9451  0.0522
## sma1  -0.8661 0.0189 -45.9099  0.0000
## 
## $AIC
## [1] 0.5264396
## 
## $AICc
## [1] 0.5265182
## 
## $BIC
## [1] 0.5583787

Part 4

The first 3 fits have spikes in the ACF of Residuals and also have bad p-values for Ljung-Box statistic. Fits 4, 5 and 6 all are good models. However, fit 6 is the best model out of the rest with the lowest AIC and BIC values. We can also see from fit 6 model that all the plots have good results. The standardized residual plot shows no pattern and the ACF of residuals doesnt have any significant values. We can also see from the Q-Q plot that its approximately normal. Lastly, the p-values for Ljung Box statistic show no sign of autocorrelation.

Part 5

Using the model selected above we can predict the future values for the next year.

astsa::sarima.for(
  xdata = cardox,
  n.ahead = 12,
  p = 0,d = 1,q = 3,
  P = 0,D = 1,Q = 1,
  S = 12
)

## $pred
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 2018                                                                        
## 2019 410.4176 411.0685 412.0005 413.4635 414.1212 413.3836 411.6374 409.5831
##           Sep      Oct      Nov      Dec
## 2018                            409.2518
## 2019 408.1255 408.4259 410.0283         
## 
## $se
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2018                                                                      
## 2019 0.3653471 0.4107911 0.4425241 0.4721290 0.4999840 0.5263670 0.5514893
##            Aug       Sep       Oct       Nov       Dec
## 2018                                         0.3089864
## 2019 0.5755161 0.5985791 0.6207860 0.6422254

To assess the quality of the model, we can visually inspect the fit of the model to the original data. The plot above shows that the model fits the data reasonably well, capturing both the overall trend and the seasonal fluctuations.We can also see that the standard errors of the prediction are low and relatively consistent to one another. This is a good indication that the model is a good quality model.